PATENT

MUSIC SEARCHING METHODS BASED ON HUMAN PERCEPTION 

1.0 BACKGROUND

Modern computers have made possible the efficient assemblage and searching
of large databases of information. Text-based information can be searched for key
words. Until recently, databases containing recordings of music could only be searched
via the textual metadata associated with each recording rather than via the acoustical
content of the music itself. The metadata includes information such as title, artist,
duration, publisher, classification applied by publisher or others, instrumentation, and
recording methods. For several reasons it is highly desirable to be able to search the
content of the music to find music which sounds to humans like other music, or which
has more or less of a specified quality as perceived by a human than another piece of
music. One reason is that searching by sound requires less knowledge on the part of
the searcher; they don't have to know, for example, the names of artists or titles. A
second reason is that textual metadata tends to put music into classes or genres, and a
search in one genre can limit the discovery of songs from other genres that may be
attractive to a listener. Yet another reason is that searching by the content of the music
allows searches when textual information is absent, inaccurate, or inconsistent.

A company called Muscle Fish LLC in Berkeley, California has developed
computer methods for classification, search and retrieval of all kinds of sound recordings.
These methods are based on computationally extracting many "parameters" from each
sound recording to develop a vector, containing a large number of data points, which
characteristically describes or represents the sound. These methods are described in a
paper entitled Classification, Search, and Retrieval of Audio by Erling Wold, Thom Blum,
Douglas Keislar, and James Wheaton, which was published in September 1999 on the
Muscle Fish website at Musclefish.com, and in US Patent 5,918,223 to Blum et al.
entitled "Method and article of manufacture for content-based analysis, storage, retrieval,
and segmentation of audio information."

The Blum patent describes how the authors selected a set of parameters that can
be computationally derived from any sound recording with no particular emphasis on
music. Data for each parameter is gathered over a period of time, such as two seconds.
The parameters are well known in the art and can be easily computed. The parameters
include variation in loudness over the duration of the recording (which captures beat
information as well as other information), variation in fundamental frequency over the
duration of the recording (often called "pitch"), variation in average frequency over the
duration of the recording (often called "brightness"), and computation over time of a
parameter called the mel frequency cepstrum coefficient (MFCC).

Mel frequency cepstra are data derived by resampling a uniformly-spaced
frequency axis to a mel spacing, which is roughly linear below 1000 Hz and logarithmic
above 1000 Hz. Mel cepstra are the most commonly used front-end features in speech
recognition systems. While the mel frequency spacing is derived from human
perception, no other aspect of cepstral processing is connected with human perception.
The processing before taking the mel spacing involves, in one approach, taking a log
discrete Fourier transform (DFT) of a frame of data, followed by an inverse DFT. The
resulting time domain signal compacts the resonant information close to the t=0 axis and
pushes any periodicity out to higher time. For monophonic sounds, such as speech, this
approach is effective for pitch tracking, since the resonant and periodic information has
little overlap. But for polyphonic signals such as music, this separability would typically
not exist.
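The log-DFT followed by inverse-DFT step described above can be sketched in a few lines. This is a minimal illustration, not the Blum implementation: it uses a naive O(N^2) DFT, omits the mel resampling, and the function names and the small `floor` constant (used to avoid taking the logarithm of zero) are assumptions made for the example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Log magnitude of the DFT of one frame, followed by an inverse DFT.

    `floor` is a hypothetical guard value so that log(0) is never evaluated.
    """
    log_magnitude = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [c.real for c in inverse_dft(log_magnitude)]
```

Low indices of the returned list correspond to the region close to t=0 where the resonant information is compacted; periodic structure shows up at higher indices.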

These parameters are chosen not because they correlate closely with human
perception, but rather because they are well known and, in computationally extracted
form, they distinguish well the different sounds of all kinds with no adaptation to
distinguishing different pieces of music. In other words, they are mathematically
distinctive parameters, not parameters which are distinctive based on human perception
of music. That correlation with human perception is not deemed important by the Blum
authors is demonstrated by their discussion of the loudness parameter. When
describing the extraction of the loudness parameter, the authors acknowledge that the
loudness which is measured mathematically does not correlate with human perception of
loudness at high and low frequencies. They comment that the frequency response of the
human ear could be modeled if desired, but, for the purposes of their invention, there is
no benefit.

In the Blum system, a large vector of parameters is generated for a
representative sample or each section of each recording. A human will then select many
[...] volume. It is known that the best models of [...] apply a compensating algorithm
based on frequency. Of course, loudness is not a meaningful descriptor of music [...]
volume or low volume, and original music can be recorded at high volume or low volume.
Figure 1 shows the prior art method described by Blum.

Figure 2 describes the method by which Blum would find sounds that sound alike.

Figure 3 is an illustration of the current invention as it is used to create a
database of descriptors of music.

Figure 4 illustrates a method for creating and searching a database.

Figure 5 is an example of an interface used to interact with a database.

Figure 6 illustrates how the current invention is used to find music that humans
perceive as sounding alike using weighted parameters.

Figure 7 illustrates how the current invention is used to find music that humans
perceive as sounding alike using weighted descriptors.

Figure 8 is a method for determining the perceptual salience of a parameter.

4.1 Prior art

The prior art method of categorizing sounds, as described by Blum, is illustrated
in Figure 1. The sounds are decomposed using digital signal processing (DSP) techniques
that are known in the art to create parameters 103 such as loudness, brightness,
fundamental frequency, and cepstrum. These parameters are stored as multi-dimensional
vectors in an n-dimensional space of parameters 104.

In order to find sounds that are alike, the procedure illustrated in Figure 2 is used.
In Phase 1, a human selects a sound, step 202, from a database of stored sounds 201,
that falls into a category of sounds, for example "laughter." This is the target sound.
The target sound is fed into a parameter extractor 204 to create a number of parameters
205, with a large number of data points over time for each parameter. The parameters
from the sound are then stored as a vector in an n-dimensional space 206.

It is assumed that all sounds within, or close to, the area describing laughter
sound like laughter. One way of exploring this is shown in Phase 2 of Figure 2. The
parameter values of the target sound are adjusted, step 207. A new sound is played,
step 208, which corresponds with the adjusted parameter values. A human listens to the
new sound and determines whether or not the new sound is perceptually similar to the
target sound, step 209. If it is not, branch 210, the parameters are again adjusted, step
207, until a similar sound is found, or there are no more sounds. If a similar sound is
found, then the parameter values of that sound are used to determine an area of similar
sounding sounds.

The prior art puts sounds into classes. This classification is binary. Either
something is like the target class or it is not. There is an assumption that mathematical
distance of a parameter vector from the parameter vectors of the target class is related
to the perceptual similarity of the sounds. Blum claims that this technique works for
cataloging transient sounds. However, for music, it is unlikely that the relationship
between two parameter vectors would have any relevance to the perceptual relationship
of the music represented by those vectors. This is because the parameters bear
insufficient relevance to the human perception of music. Also, music cannot be
adequately represented with a binary classification system. A music classification
system must account for the fact that a piece of music can belong to one of several
classes. For example, something may be country with a rock influence. Also, a
classification system must account for the fact that a piece of music may be a better or
worse example of a class. Some music may have more or less of some
descriptive element. For example, a piece of music may have more or less energy, or be
more or less country than another piece. A set of scalar descriptors allows for more
comprehensive and inclusive searches of music.






4.2 Current invention: modeling descriptors

4.2.1 Overview of operation

The invention described here is illustrated in Figure 3. A database of stored
music 301 is played to one or more humans, step 302, who rate the music on the
amount of one or more descriptors. The same music is fed into a parameter extractor
303, that uses methods known in the art to extract parameters 304 that are relevant to
the perception of music, such as tempo, rhythm complexity, rhythm strength, brightness,
dynamic range, and harmonicity. Numerous different methods for extracting each of
these parameters are known in the art. A model of a descriptor 305 is created by
combining the parameters with different weightings for each parameter. The weightings
may vary with the value of the parameter. For example, the parameter "brightness" may
contribute to a descriptor value only when it is above a threshold or below a threshold or
within a range. The model is refined, step 307, by minimizing the difference, calculated
in step 306, between the human-derived descriptor value and the machine-derived
value.
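As a rough illustration of a descriptor model whose weightings depend on the parameter value, the sketch below lets each parameter contribute only when it lies inside a range, and measures the difference between human-derived and machine-derived values as a mean squared error. The data layout, the function names, and the choice of squared error are assumptions for the example; the text does not specify them.

```python
def descriptor_value(params, terms):
    """Weighted sum of parameters.

    params: dict mapping parameter name -> extracted value.
    terms: list of (name, weight, low, high); a term contributes only
           when low <= params[name] <= high (hypothetical layout).
    """
    return sum(weight * params[name]
               for name, weight, low, high in terms
               if low <= params[name] <= high)


def mean_squared_error(human_values, machine_values):
    """Difference between human-derived and machine-derived descriptor values."""
    pairs = list(zip(human_values, machine_values))
    return sum((h - m) ** 2 for h, m in pairs) / len(pairs)
```

Refinement then amounts to adjusting the weights and ranges in `terms` until the error over the rated songs stops decreasing.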

The objective of using human listeners is to create models of their perceptual
processes of the descriptors, using a subset of all music, and to apply that model to the
categorization of the set of all music.

4.2.2 Limitations of the models

The limit on the goodness of the fit of the model (its predictive power) is determined by,
among other things:

(1) The variability between the human responses, step 302: High variability
means that people do not agree on the descriptors and any model will
have poor predictive power.

(2) The intra-song variability in the set of all music: High variability within a
song means that any one part of the song will be a poor representation of
any other part of the same song. This will impede the listeners' task of
judging a representative descriptor for that song, resulting in less accurate
data and a less accurate model.

(3) How well the subset represents the set: The ability to create a model
depends on the existence of patterns of parameters in similar songs. If
the subset on which the model is based does not represent the patterns
that are present in the set, then the model will be incorrect.

We can improve the performance of the model by:

(a) Choosing descriptors for which there is low inter-rater variability. Our
criterion for selecting descriptors is a minimum correlation coefficient (r) of 0.5 between
the mean from at least 5 human raters, and the individual scores from those raters.

(b) Applying the technique to music which has low intra-song variability
(e.g. some Classical music has high variability within a song). One method for
determining intra-song variability is to extract parameters from a series of short
contiguous samples of the music. If more than half of the parameters have a standard
deviation greater than twice the mean, then the song is classified as having high
intra-song variability. Similarly, if more than half of the mean parameters of a song lie
more than 3 standard deviations from the mean of the population, that song is classified
as having high intra-song variability. This excludes less than 1% of all songs.

(c) Using statistical sampling techniques known in the art (for example, for
political polling) to ensure that the subset represents the set.
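Criteria (a) and (b) above can be sketched as two small checks. This is an illustration under stated assumptions: the inter-rater criterion is read here as the average of the rater-versus-mean correlations being at least 0.5, the population standard deviation is used, and "twice the mean" is applied to the absolute value of the mean. All function names and data layouts are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def descriptor_is_usable(ratings, min_r=0.5, min_raters=5):
    """ratings: one list per rater, each rating the same songs in the same order.

    Assumption: the descriptor passes when the average correlation between
    each rater's scores and the mean rating is at least min_r.
    """
    if len(ratings) < min_raters:
        return False
    n_songs = len(ratings[0])
    mean_rating = [sum(r[i] for r in ratings) / len(ratings) for i in range(n_songs)]
    correlations = [pearson(r, mean_rating) for r in ratings]
    return sum(correlations) / len(correlations) >= min_r


def has_high_intra_song_variability(samples):
    """samples: one dict (parameter name -> value) per short contiguous sample.

    A song is flagged when more than half of its parameters have a standard
    deviation greater than twice the (absolute) mean.
    """
    names = list(samples[0].keys())
    flagged = 0
    for name in names:
        values = [s[name] for s in samples]
        if statistics.pstdev(values) > 2 * abs(statistics.mean(values)):
            flagged += 1
    return flagged > len(names) / 2
```

The second intra-song test (song means lying more than 3 standard deviations from the population mean) would follow the same pattern with population statistics supplied by the caller.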

4.2.3 Collecting human data

The preferred method for collecting human data is to use a panel of ear-pickers
who listen to each song and ascertain the quantity of each of several descriptors, using a
Likert scale, which is well known in the art. For example, nine ear-pickers are given a
rating scale with a list of descriptors, as shown below:



Rating scales

Each descriptor is rated on a scale of 1 to 9. The anchor labels printed along each
scale are listed from low to high.

1. Energy: How much does the song make you want to move or sing?
   1 2 3 4 5 6 7 8 9
   [illegible] movement / Little movement / Some movement / A lot of movement / Very much movement
   Very light energy / Light energy / Medium energy / High energy / Very high energy

2. Rhythm salience: What is the relative contribution of the rhythm to the overall sound
of the song?
   1 2 3 4 5 6 7 8 9
   No rhythmic component / Very little rhythm / Moderate rhythm / A lot of rhythm / The song is all rhythm

3. Melodic salience: What is the relative contribution of the melody (lead singer/lead
instrument) to the overall sound of the song?
   1 2 3 4 5 6 7 8 9
   No melody / Melody is not too important / Melody is moderately important / Melody is quite important / The song is all melody

4. Tempo: Is the song slow or fast?
   1 2 3 4 5 6 7 8 9
   Extremely slow / Pretty slow / Moderate tempo / Pretty fast / Extremely fast

5. Density: How dense is the song? How many sounds per second?
   1 2 3 4 5 6 7 8 9
   Not at all dense / A little / Moderate / Fairly dense / Extremely

6. Mood (happiness): What is the overall mood or emotional valence of the song?
   1 2 3 4 5 6 7 8 9
   Extremely sad / Pretty sad / Neither happy nor sad / Fairly happy / Extremely happy



These data are analyzed, and only those descriptors for which there is high
inter-rater agreement (low variability) are used in the development of the system. For
example, the correlations between the mean ratings and the individual [...] values [...]
melodic salience, and rhythmic salience are all [...]

Pearson correlation of each subject's ratings with the mean subject rating (subjects 101,
102, 103, 104, 105, 106, 107, 108 and 201). Only the legible values are listed, in source
order; their alignment with individual subjects is not recoverable.

Descriptor          Correlation values                          Mean   Standard error of mean
ANGER               0.76 0.73 0.65 0.66 0.75 0.70 0.69 0.63     0.70   0.03
DENSITY             0.40 0.74 0.88 0.84 0.89                    0.76   0.06
[label illegible]   0.68 0.74 0.68 0.71 0.69 0.75 0.71          0.73   0.02
ENERGY              0.88 0.83 0.85 0.77 0.85 0.88 0.81 0.89     0.84   0.01
HAPPY               0.83 0.86 0.77 0.66 0.72 0.83 0.59          0.76   0.03
MELODIC SALIENCE    0.67 0.67 0.72 0.72 0.14 0.39 0.57          0.54   0.06
RHYTHM SALIENCE     0.73 0.82 0.30 0.63 0.37 0.87 0.74          0.68   0.07

Another method uses music with known quantities of some descriptor as defined
by the purpose to which it is put by the music-buying public, or by music critics. Our
technique rank orders the recommended songs by their quantity of the descriptor, either
using the values that come with the music, or using our panel of ear-pickers. For
example, www.jamaicans.com/eddyedwards features music with Caribbean rhythms that
are high energy. Amazon.com features a mood matcher in which music critics have
categorized music according to its uses. For example, for line dancing they recommend
the following:

Wreck Your Life by Old 97's

The Best Of Billy Ray Cyrus by Billy Ray Cyrus

Guitars, Cadillacs, Etc., Etc. by Dwight Yoakam

American Legends: Best Of The Early Years by Hank Williams

Vol. 1-Hot Country Hits by Mcdaniel, et al

Another method uses professional programmers who create descriptors to create
programs of music for play in public locations. For example, one company has
discovered by trial and error the type of music they need to play at different times of the
day to energize or pacify the customers of locations for which they provide the ambient
music. They use the descriptor "energy," rated on a scale of 1 (low) to 5 (high). We
used 5 songs at each of 5 energy levels and in 5 genres (125 songs total), extracted the
parameters and created a model of the descriptor "energy" on the basis of the
company's human-derived energy values. We then applied that model to 125 different
songs and found an 88% match between the values of our machine-derived descriptors
and the values from the human-derived descriptors.

4.2.5 Modeling

The preferred method for representing each descriptor uses generalized linear
models, known in the art (e.g. McCullagh and Nelder (1989) Generalized Linear Models,
Chapman and Hall). For example, the preferred model of "energy" uses linear
regression, and looks like this:

(1) $\mathrm{Energy} = \beta_0 + \beta_1 \cdot \mathrm{Harmonicity} + \beta_2 \cdot \mathrm{DynamicRange} + \beta_3 \cdot \mathrm{Loudness} + \beta_4 \cdot \mathrm{RhythmComplexity} + \beta_5 \cdot \mathrm{RhythmStrength}$

The preferred weighting values are:
$\beta_0 = 4.92$
$\beta_1 =$ [illegible]
$\beta_2 = -45.09$
$\beta_3 = -7.84$
$\beta_4 = 0.016$
$\beta_5 = 0.001$

The preferred descriptor model for "happiness" is:

(2) $\mathrm{Happiness} = \beta_0 + \beta_1 \cdot \mathrm{Articulation} + \beta_2 \cdot \mathrm{Attack} + \beta_3 \cdot \mathrm{NoteDuration} + \beta_4 \cdot \mathrm{Tempo} + \beta_5 \cdot \mathrm{DynamicRangeLow} + \beta_6 \cdot \mathrm{DynamicRangeHigh} + \beta_7 \cdot \mathrm{SoundSalience} + \beta_8(\mathrm{Key})$

The preferred weighting values are:
$\beta_0 = 6.51$
$\beta_1 = -4.14$
$\beta_2 = 8.64$
$\beta_3 = -15.84$
$\beta_4 = 14.73$
$\beta_5 = 6.1$
$\beta_6 = -8.7$
$\beta_7 = 11.00$
$\beta_8 = 0$ if no key; 10 if minor keys; 20 if major keys

It is likely possible to improve each model by adjusting the weighting values $\beta_i$
so that they vary with the input value of the parameter, or using different extraction
methods for one or more parameters, or adding other parameters to the step of extracting
parameters.
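The happiness model of equation (2) can be written as a short function. This is a sketch only: the weights are taken in order from the list of preferred values, the parameter names and the dictionary layout are assumptions, and the units or scaling of the extracted parameters are not stated in the text, so outputs are illustrative.

```python
# Key-dependent term: 0 if no key, 10 for minor keys, 20 for major keys.
KEY_TERM = {"none": 0.0, "minor": 10.0, "major": 20.0}

# Weights in the order listed for equation (2); names are hypothetical.
HAPPINESS_WEIGHTS = {
    "articulation": -4.14,
    "attack": 8.64,
    "note_duration": -15.84,
    "tempo": 14.73,
    "dynamic_range_low": 6.1,
    "dynamic_range_high": -8.7,
    "sound_salience": 11.00,
}
HAPPINESS_INTERCEPT = 6.51


def happiness(params, key="none"):
    """Linear happiness model: intercept + weighted parameters + key term.

    params: dict mapping a subset of HAPPINESS_WEIGHTS names to values;
    parameters that are not supplied contribute nothing.
    """
    total = HAPPINESS_INTERCEPT + KEY_TERM[key]
    for name, value in params.items():
        total += HAPPINESS_WEIGHTS[name] * value
    return total
```

The energy model of equation (1) has the same shape, with its own intercept and five weights.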



Another method of optimizing a descriptor model involves using non-linear
models. For example:

(3) $\mathrm{Energy} = F(\beta_1 \cdot \mathrm{tempo} \, (\beta_2 \cdot \mathrm{tempo} + \beta_3 \cdot \mathrm{sound\ salience}))$

where F is the cumulative normal distribution:

$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-\mu)^2 / (2\sigma^2)} \, dt$

$\sigma$ = the standard deviation of x
$\mu$ = the mean of x

The values of $\beta$ are set to 1.0. Other values will be substituted as we develop the
process.
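The cumulative normal distribution F can be evaluated with the error function from the standard library. The sketch below assumes one reading of the non-linear energy expression, in which F is applied to tempo multiplied by a weighted sum of tempo and sound salience; that reading, the function names, and the argument order are assumptions for illustration.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def nonlinear_energy(tempo, sound_salience, mu, sigma, b1=1.0, b2=1.0, b3=1.0):
    """Assumed form: F(b1 * tempo * (b2 * tempo + b3 * sound_salience)).

    The weights default to 1.0, as stated in the text.
    """
    argument = b1 * tempo * (b2 * tempo + b3 * sound_salience)
    return normal_cdf(argument, mu, sigma)
```

Because F maps any input into the interval from 0 to 1, the result would still need to be rescaled to whatever range the energy descriptor uses.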

Yet another method involves using heuristics. For example, if the beats per
minute value of a song is less than 60, then the energy cannot be more than some
predetermined value.

The output from each descriptor model is a machine-derived descriptor 308.
Several such descriptors for each song are stored in a database 309. The presently
preferred descriptors for use in the preferred system are:
Energy
Tempo
Mood (happiness)
Mood (anger)
Danceability

Once the models of the descriptors have been created, using a subset of all
available music 301, they are applied to the classification of other music 311. Tests are
conducted to determine the fit between the descriptor models derived from the subset of
music and the other music, by substituting some of the other music 311 into the process
beginning with the parameter extractor 303. As new music becomes available, a subset
of the new music is tested against the models by placing it into the process. Any
adjustments of the models are applied to all of the previously processed music, either by
reprocessing the music, or, preferably, reprocessing the parameters originally derived
from the music. From time to time, a subset of new music is placed into the process at
step 301. Thus, any changes in the tastes of human observers, or in styles of music can
be measured and accommodated in the descriptor models.



4.2.4 Parameter extractors

The following parameter extraction methods are preferred:

Harmonicity: Harmonicity is related to the number of peaks in the frequency
domain which are an Integer Multiple of the Fundamental (IMF) frequency. The
harmonicity value is expressed as a ratio of the number of computed IMFs to a maximum
IMF value (specified to be four). Harmonicity values H are computed for time windows of
length equal to one second for a total of 20 seconds. Mean and standard deviation
values are additional parameters taken over the vector H.

Loudness: Loudness is defined to be the root mean square value of the song
signal. Loudness values L were computed for time windows of length equal to one
second for a total of 20 seconds.

Dynamic Range: Standard deviation of loudness for 20 values, calculated
1/second for 20 seconds.
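The loudness and dynamic range definitions above translate directly into code. The sketch assumes the signal is a plain list of samples and uses the population standard deviation; both choices, and the function names, are assumptions for illustration.

```python
import math
import statistics


def window_loudness(signal, sample_rate, window_seconds=1.0, total_seconds=20.0):
    """Root mean square value of the signal in consecutive windows.

    Returns one loudness value per complete window, up to
    total_seconds / window_seconds values.
    """
    window = int(sample_rate * window_seconds)
    count = int(total_seconds / window_seconds)
    values = []
    for i in range(count):
        chunk = signal[i * window:(i + 1) * window]
        if len(chunk) < window:
            break  # signal shorter than the requested duration
        values.append(math.sqrt(sum(s * s for s in chunk) / window))
    return values


def dynamic_range(loudness_values):
    """Standard deviation of the per-window loudness values."""
    return statistics.pstdev(loudness_values)
```

For a 20 second excerpt this yields 20 loudness values L and a single dynamic range value.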






Rhythm strength: Rhythm strength is calculated in the same process used to
extract tempo. First, a short-time Fourier transform spectrogram of the song is
performed, using a window size of 92.8 ms (Hanning windowed), and a frequency
resolution of 10.77 Hz. For each frequency bin in the range of 0-500 Hz, an onset track
is formed by computing the first difference of the time history of amplitude in that bin.
Large positive values in the onset track for a certain frequency bin correspond to rapid
onsets in amplitude at that frequency. Negative values (corresponding to decreases in
amplitude) in the onset tracks are truncated to zero, since the onsets are deemed to be
most important in determining the temporal locations of beats. A correlogram of the
onset tracks is then computed by calculating the unbiased autocorrelation of each onset
track. The frequency bins are sorted in decreasing order based on the activity in the
correlation function, and the twenty most active correlation functions are further analyzed
to extract tempo information.

Each of the selected correlation functions is analyzed using a peak detection
algorithm and a robust peak separation method in order to determine the time lag
between onsets in the amplitude of the corresponding frequency bin. If a lag can be
identified with reasonable confidence, and if the value lies between 222 ms and 2
seconds, then a rhythmic component has been detected in that frequency bin. The lags
of all of the detected components are then resolved to a single lag value by means of a
weighted greatest common divisor algorithm, where the weighting is dependent on the
total energy in that frequency bin, the activity in the correlation function for that frequency
bin, and the degree of confidence achieved by the peak detection and peak separation
algorithms for that frequency bin. The tempo of the song is set to be the inverse of the
resolved lag.

The rhythm strength is the sum of the activity levels of the 20 most active
correlation functions, normalized by the total energy in the song. The activity level of
each correlation function is defined as the sum-of-squares of the negative elements of the
second difference of that function. It is a measure of how strong and how repetitive the
beat onsets are in that frequency bin.
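The onset track, unbiased autocorrelation, and activity level described above can be sketched as follows. The spectrogram step is left out: each frequency bin is assumed to arrive as a list of amplitudes over time, and the total energy of the song is assumed to be supplied by the caller. Function names are hypothetical.

```python
def onset_track(amplitudes):
    """First difference of a bin's amplitude history, decreases truncated to zero."""
    return [max(b - a, 0.0) for a, b in zip(amplitudes, amplitudes[1:])]


def unbiased_autocorrelation(track):
    """Autocorrelation where each lag is divided by the number of terms summed."""
    n = len(track)
    return [sum(track[i] * track[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(n)]


def activity_level(correlation):
    """Sum of squares of the negative elements of the second difference."""
    second = [correlation[i + 2] - 2 * correlation[i + 1] + correlation[i]
              for i in range(len(correlation) - 2)]
    return sum(d * d for d in second if d < 0)


def rhythm_strength(bin_amplitudes, total_energy, top=20):
    """Sum of the `top` largest activity levels, normalized by total energy."""
    activities = sorted(
        (activity_level(unbiased_autocorrelation(onset_track(a)))
         for a in bin_amplitudes),
        reverse=True,
    )
    return sum(activities[:top]) / total_energy
```

Peak detection, peak separation, and the weighted greatest common divisor step used for tempo are not covered by this sketch.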



Rhythm Complexity: The number of rhythmic events per measure. A measure
prototype is created by dividing the onset tracks into segments whose length
corresponds to the length of one measure. These segments are then summed together
to create a single, average onset track for one measure of the song; this is the measure
prototype. The rhythm complexity is calculated as the number of distinct peaks in the
measure prototype.
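A minimal sketch of the measure prototype and peak count follows. It assumes a single onset track, a measure length given in samples, and a simple definition of a "distinct peak" (a value above a threshold that is higher than its left neighbor and not lower than its right neighbor); the text does not define the peak test, so that part is an assumption.

```python
def measure_prototype(onsets, measure_length):
    """Sum consecutive measure-length segments of an onset track."""
    prototype = [0.0] * measure_length
    full_measures = len(onsets) // measure_length
    for m in range(full_measures):
        for i in range(measure_length):
            prototype[i] += onsets[m * measure_length + i]
    return prototype


def count_peaks(values, threshold=0.0):
    """Count local maxima above the threshold (assumed peak definition)."""
    peaks = 0
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i + 1 < len(values) else float("-inf")
        if v > threshold and v > left and v >= right:
            peaks += 1
    return peaks


def rhythm_complexity(onsets, measure_length, threshold=0.0):
    """Number of distinct peaks in the measure prototype."""
    return count_peaks(measure_prototype(onsets, measure_length), threshold)
```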

Articulation is the ratio of note length, i.e. the duration from note start to note end
(L), over the note spacing, i.e. the duration from one note start to the next note start (S).
An L:S ratio close to 1.00 reflects legato articulation. Ratios less than 1.00 are considered
staccato.

Attack is the speed at which a note achieves its half-peak amplitude.

Note Duration: Pitch is extracted by using a peak separation algorithm to find the
separation in peaks in the autocorrelation of the frequency domain of the song signal.
The peak separation algorithm uses a windowed threshold peak detection algorithm
which uses a sliding window and finds the location of the maximum value in each of the
peak-containing-regions in every window. A peak-containing-region is defined as a
compact set of points where all points in the set are above the specified threshold, and
the surrounding points are below the threshold. A confidence measure in the pitch is
returned as well; confidence is equal to harmonicity. Pitch values P are computed for
time windows of length equal to 0.1 second. Changes of less than 10 Hz are considered
to be one note.
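The peak-containing-region definition lends itself to a short sketch. The version below scans one window only (the sliding of the window is omitted), and the function names are hypothetical.

```python
def peak_containing_regions(values, threshold):
    """Return (start, end) index pairs of runs where every value exceeds the threshold.

    `end` is exclusive; points adjacent to each run are at or below the threshold.
    """
    regions, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(values)))
    return regions


def peak_locations(values, threshold):
    """Index of the maximum value inside each peak-containing region."""
    return [max(range(a, b), key=lambda i: values[i])
            for a, b in peak_containing_regions(values, threshold)]
```

The separation between successive peak locations is what the peak separation step would then measure.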

Tempo: The tempo extraction technique is described in an earlier section on
rhythm strength.

Dynamic Range Low: Standard deviation of loudness calculated over a duration
of 10 seconds.

Dynamic Range High: Standard deviation of loudness calculated over a duration
of 0.1 seconds.

Sound Salience uses a modified version of the rhythm extraction algorithm, in
which spectral events without rapid onsets are identified.

Key: Determines the key by the distribution of the notes identified with the note
extractor.

Optimizing the parameter extractors

Another method for optimizing the fit between the human-derived descriptor value
and the machine-derived value is to adjust the extractor performance, step 310. This
can be accomplished by using extractors in which some or all of the internal values, for
example the sample duration, or the upper or lower bounds of the frequencies being
analyzed, are adjusted. The adjustment is accomplished by having an iterative program
repeat the extraction and adjust the values until the error, step 306, between the human
values and the machine values is minimized.

It is important that the parameters extracted from the music have some
perceptual salience. This is tested using the technique illustrated in Figure 8. The
parameter extractor 801 is tested, step 802, by visually or audibly displaying the
parameter value while concurrently playing the music from which it was extracted. For
example, a time history of the harmonicity values of a song, sampled every 100 ms, is
displayed on a screen, with a moving cursor line. The computer plays the music and
moves the cursor line so that the position of the cursor line on the x-axis is synchronized
with the music. If the listener or listeners can perceive the correct connection between
the parameter value and the changes in the music then the parameter extractor is
considered for use in further model development, branch 803. If it is not perceived, or
not perceived correctly, then that extractor is rejected or subjected to further
improvement, branch 804.

4.3 Interacting with the database (Figures 4 and 5)

Once the database of descriptors 309 has been created, it can be combined with
other metadata about the music, such as the name of the artist, the name of the song,
the date of recording, etc. One method for searching the database is illustrated in Figure
4. A large set of music 401, representing "all music," is sent to the parameter extractor
402. The descriptors are then combined with other metadata and pointers to the
location of the music, for example URLs of the places where the music can be
purchased, to create a database 405. A user can interrogate the database 405, and
receive the results of that interrogation using an interface 406.

An example of an interface is illustrated in Figure 5. A user can type in textual
queries using the text box 504. For example: "Song title: Samba pa ti, Artist: Santana."
The user then submits the query by pressing the sort by similarity button 503. The song
"Samba pa ti" becomes the target song, and appears in the target box 501. The
computer searches the n-dimensional database 405 of vectors made up of the song
descriptors, looking for the smallest vector distances between the target song and other
songs. These songs are arranged in a hit list box 502, in increasing vector
distance (decreasing similarity). An indication of the vector distance between each hit
song and the target song is shown beside each hit song, expressed as a percent, with
100% being the same song. For example, the top of the list may be "Girl from Ipanema
by Stan Getz, 85%."
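The similarity search described above can be sketched as a nearest-neighbor ranking over descriptor vectors. Euclidean distance is an assumption here (the text says only "vector distance"), and the conversion of distance to a percent is not specified, so it is left out. The function name and data layout are hypothetical.

```python
import math


def rank_by_similarity(target, songs):
    """Rank songs by increasing Euclidean distance to the target vector.

    target: list of descriptor values for the target song.
    songs: dict mapping song title -> descriptor vector of the same length.
    Returns a list of (title, distance) pairs, most similar first.
    """
    def distance(vector):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(target, vector)))

    return sorted(((title, distance(vector)) for title, vector in songs.items()),
                  key=lambda pair: pair[1])
```

A song identical to the target has distance 0 and therefore appears first, matching the "100% being the same song" convention.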

Another type of query allows a user to arrange the songs by profile. First, the
user presses the sort by number button 509, which lists all of the songs in the database in
numerical order. The user can scroll through the songs using a scroll bar 606. They can
select a song by clicking on it, and play the song by double clicking on it. Any selected
song is profiled by the slider bars 506. These show the scalar values of each of several
descriptors. The current preferred method uses energy, danceability and anger.
Pressing the sort by profile button 510 places the highlighted song in the target box 501
and lists songs in the hit box 502 that have the closest values to the values of the target
song.

Yet another type of query allows the user to sort by similarity plus profile. First a
target song is chosen and the songs in the hit box 502 are listed by similarity. Then the
user performs a search of this subset of songs by using the slider bars 505. For
example, the slider values of the original target song are 5, 6, 7 for energy, danceability
and anger respectively. The user increases the energy value from 5 to 9 and presses
the search similar songs button 507. The profile of the target song remains as a ghosted
image 508 on the energy slider. The computer searches the subset of songs for a song
with values of 9, 6 and 7. Songs with that profile are arranged in decreasing order of
similarity. Songs without that profile are appended, arranged by decreasing similarity.
The target song remains in the target box 501. A user can choose a new target song by
clicking on a song in the hit list box 502.

4.4 Modeling likeness with parameters (Figure 6)

Figure 6 illustrates how the invention is used to find music that sounds to human
listeners like any musical composition selected by a user. The processes in Figure 3 are
repeated to create a set of parameters 604. These are used to create a model of
likeness 605 by a process described below.



4.4.1 CollfentinQ human rtafg 

5 One or more humans listen to pairs of songs and judge their similarity on a scale, 

for example from 1 to 9, where 1 is very dissimilar, and 9 Is very similar, step 602. 

The objective of using human listeners is to create a model of their perceptual 
process of 'likeness", using a subset of all music, and to apply that model to the set of all 
music. 

The preferred method for collecting human data is to use a panel of ear-pickers
who listen to pairs of songs and score their similarity, using a Likert scale, which is well
known in the art.

Another method is to have people visit a web site on which they can listen to
pairs of songs and make similarity judgments. These judgments could be on a scale of 1
to 9, or could be yes/no. It is possible to estimate a scalar value of similarity based on a
large number of binary judgments using statistical techniques known in the art.
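
The specification does not name the statistical technique used to turn yes/no judgments into a scalar. As one simple possibility, offered only as an assumption, the proportion of "yes" votes for a pair can be mapped linearly onto the 1 to 9 scale. The function name and arguments below are hypothetical.

```python
def similarity_from_votes(yes_votes, total_votes, low=1.0, high=9.0):
    """Map the share of 'sounds alike' votes for a song pair onto a low..high scale.

    Assumption: a plain proportion; the source only says that suitable
    statistical techniques are known in the art.
    """
    if total_votes <= 0:
        raise ValueError("at least one judgment is required")
    if not 0 <= yes_votes <= total_votes:
        raise ValueError("yes_votes must lie between 0 and total_votes")
    return low + (high - low) * yes_votes / total_votes


# Hypothetical pair judged alike by 30 of 40 web site visitors.
print(similarity_from_votes(30, 40))
```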

4.4.2 Creating the likeness model

The objective of the model is to predict these perceived differences using the
extracted parameters of music. To build the model, a list of numbers is calculated for the
comparison of each song to each other song. The list of numbers consists of a value for
each parameter where the value is the difference between the parameter value for a first
song and the value of the same parameter for a second song. When the model is used
to compare one song to another for likeness, the list of parameter differences between
the two songs is calculated and these differences are the inputs to the model. The
model then yields a number that predicts the likeness that people would judge for the
same two songs. The model processes the list of difference values by applying weights
and heuristics to each value.

The preferred method of creating a model of likeness is to sum the weighted
parameter values. For example, for three parameters A, B and C, the following steps
are used for songs 1 and 2:

STEP 1 - subtract the parameters of each song

    $A_1 - A_2, \quad B_1 - B_2, \quad C_1 - C_2$

STEP 2 - calculate the absolute differences. Our preferred method uses a value
of n = 2, but other values can be used.

    $\sqrt[n]{(A_1 - A_2)^n}, \quad \sqrt[n]{(B_1 - B_2)^n}, \quad \sqrt[n]{(C_1 - C_2)^n}$

STEP 3 - weight and sum the differences. The values of the weights ($\beta_1$ to $\beta_3$) are
determined by linear regression, as explained below.

    $\text{Likeness} = \beta_1 A_{\text{difference}} + \beta_2 B_{\text{difference}} + \beta_3 C_{\text{difference}}$
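
The three steps can be sketched as a short Python function. The parameter values and weights used in the example are hypothetical. The sketch also assumes that taking the absolute value before raising to the power n is acceptable, so that odd values of n still give a non-negative difference.

```python
def likeness_score(params_1, params_2, weights, n=2):
    """Weighted sum of absolute parameter differences between two songs."""
    # STEP 1: subtract the parameters of each song.
    diffs = [a - b for a, b in zip(params_1, params_2)]
    # STEP 2: n-th root of the n-th power, i.e. the absolute difference.
    abs_diffs = [(abs(d) ** n) ** (1.0 / n) for d in diffs]
    # STEP 3: weight and sum the differences.
    return sum(w * d for w, d in zip(weights, abs_diffs))


# Hypothetical values for parameters A, B and C of songs 1 and 2.
song_1 = [1.0, 5.0, 3.0]
song_2 = [4.0, 5.0, 1.0]
weights = [0.5, 2.0, 1.0]  # stand-ins for beta_1 to beta_3
print(likeness_score(song_1, song_2, weights))
```

With these numbers the absolute differences are 3, 0 and 2, so the weighted sum is 0.5 * 3 + 2 * 0 + 1 * 2.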
The values of the weights are determined by a process of linear regression, step
608, which seeks to minimize the difference, step 606, between the human-derived
likeness values, step 602, and the output from the model 605. The preferred model of
likeness is

$$
\begin{aligned}
\text{Likeness} = {} & \beta_0 + \beta_1 \cdot \text{mean loudness} + \beta_2 \cdot \text{rhythm strength} + \beta_3 \cdot \text{tempo} \\
& + \beta_4 \cdot \text{dynamic range} + \beta_5 \cdot \text{mean brightness} + \beta_6 \cdot \text{mean harmonicity} \\
& + \beta_7 \cdot \text{rhythm complexity} + \beta_8 \cdot \text{standard deviation brightness} \\
& + \beta_9 \cdot \text{standard deviation harmonicity}
\end{aligned}
\tag{4}
$$

Where

    $\beta_1 = -0.108$
    $\beta_2 = -0.225$
    $\beta_3 = -0.127$
    $\beta_4 = -0.015$
    $\beta_5 = -0.296$
    $\beta_6 = -0.223$
    $\beta_7 = -0.122$
    $\beta_8 = 0.277$
    $\beta_9 = -0.074$
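
The regression step can be illustrated with a small ordinary least squares fit. This is a sketch under assumptions, not the applicants' implementation: it solves the normal equations in plain Python, and the difference values and human scores are made-up toy data chosen so that the fit is exact.

```python
def fit_weights(diff_rows, human_scores):
    """Least squares fit of [beta_0, beta_1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in diff_rows]  # leading 1.0 gives the intercept
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, human_scores))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("difference data do not determine the weights")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


# Toy data: absolute differences for two parameters over five song pairs,
# with human likeness scores that follow 9 - 2*d1 - 1*d2 exactly.
diffs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
scores = [9, 7, 8, 6, 4]
betas = fit_weights(diffs, scores)
print([round(b, 6) for b in betas])
```

Negative weights, as in the preferred model, mean that larger parameter differences lower the predicted likeness.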

Another method for deriving likeness is to calculate the correlation coefficients (r)
of the parameter values between each pair of songs in the database, and to create a
matrix of similarity for the songs, with high correlation equating to high similarity. The
parameter values are normalized to ensure that they all have the same range. Song 1
provides the parameters for the x values, and song 2 provides the parameters for the y
values in the following formula:

    $r = \dfrac{\sum_i (x_i - x_{\text{mean}})(y_i - y_{\text{mean}})}{\sqrt{\left[\sum_i (x_i - x_{\text{mean}})^2\right]\left[\sum_i (y_i - y_{\text{mean}})^2\right]}}$

where $x_{\text{mean}}$ and $y_{\text{mean}}$ are the means of the normalized parameter values for
songs 1 and 2.

The rationale behind using correlation coefficients is that if the parameters
of two songs have a high positive correlation which is statistically significant then
the two songs will be judged to be alike.
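
The correlation formula translates directly into Python. The sketch assumes the parameter values passed in are already normalized to a common range, and the example vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient between the parameter vectors of two songs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two values")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    numerator = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - x_mean) ** 2 for a in x) * sum((b - y_mean) ** 2 for b in y)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator


# Hypothetical normalized parameter values for three songs.
song_1 = [0.1, 0.5, 0.9]
song_2 = [0.2, 0.6, 1.0]   # same shape as song_1, so r is close to +1
song_3 = [0.9, 0.5, 0.1]   # reversed shape, so r is close to -1
print(pearson_r(song_1, song_2), pearson_r(song_1, song_3))
```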

4.4.3 Organizing and storing the data

The preferred method of storing and organizing the parameter differences data is
as a multi dimensional vector in a multi dimensional database 607. The resulting matrix
contains n*(n-1)/2 cells where n is the number of songs. The model is used by starting
with a target song 609, calculating the difference in value for each parameter between
the comparison song and the songs in the target database, steps 603 - 605, and then
applying the model to the difference values to arrive at a value which represents the
likeness between the comparison song and each song in the target database.

An alternative method precomputes the 2 dimensional similarity matrix such that
each song is connected with only those songs with which there is a match above a
predetermined value. Thus, the low matching songs are culled from the similarity matrix.
This decreases the size of the database and can increase the search speed.
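
Both storage options can be sketched with one helper that scores every unordered pair of songs and optionally drops pairs below a threshold. The dictionary layout, the song identifiers, the threshold and the `toy_likeness` function are assumptions made for illustration; a real system would plug in the fitted likeness model.

```python
from itertools import combinations


def build_similarity_table(song_params, likeness_fn, threshold=None):
    """Score each unordered song pair once; keep pairs at or above the threshold."""
    table = {}
    for (id_a, p_a), (id_b, p_b) in combinations(sorted(song_params.items()), 2):
        score = likeness_fn(p_a, p_b)
        if threshold is None or score >= threshold:
            table[(id_a, id_b)] = score
    return table


def toy_likeness(p, q):
    """Hypothetical stand-in for the model: 9 minus the summed absolute differences."""
    return 9.0 - sum(abs(a - b) for a, b in zip(p, q))


songs = {"s1": (1.0, 2.0), "s2": (1.5, 2.0), "s3": (6.0, 7.0), "s4": (1.0, 2.5)}
full = build_similarity_table(songs, toy_likeness)
culled = build_similarity_table(songs, toy_likeness, threshold=8.0)
print(len(full), sorted(culled))
```

For the four songs the full table holds 4*(4-1)/2 = 6 cells, while the culled table keeps only the pairs that match well.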

4.4.4 Limitations of the model

The limit on the goodness of the fit of the model (its predictive power) is determined by,
among other things:

(1) The variability between the human responses, step 602. High variability
means that people do not agree on the descriptors and any model will
have poor predictive power.

(2) The intra-song variability in the set of all music. High variability within a song
means that any one part of the song will be a poor representation of any
other part of the same song. This will impede the listeners' task of judging
similarity between such songs, resulting in less accurate data and a less
accurate model.

(3) How well the subset represents the set. The ability to create a model
depends on the existence of patterns of parameters in similar songs. If the
subset on which the model is based does not represent the patterns that are
present in the set, then the model will be incorrect.
We have found that a group of 12 human observers had a correlation coefficient
of 0.5 or greater in what they consider sounds alike. This indicates that there is sufficient
inter-rater reliability to be able to model the process. We can further improve our
chances of successfully using the model to predict what sounds alike by:

(a) Only applying the technique to music which has low intra-song
variability (e.g. some Classical music has high variability within songs).

(b) Using statistical sampling techniques known in the art (for example, for
political polling) to ensure that the subset represents the set.
4.5 Modeling likeness with descriptors (Figure 7)

Figure 7 illustrates an alternative method for finding music that sounds to human
listeners like any musical composition selected by a user. The processes in Figure 3 are
repeated to create a set of descriptors 706. These are used to create a model of
likeness 707 by a process similar to that used to create the model of likeness using
parameters 605.

4.5.1 Collecting human data

One or more humans listen to pairs of songs and judge their similarity on a scale, for
example from 1 to 9, where 1 is very dissimilar, and 9 is very similar, step 702. The
objective of using human listeners is to create a model of their perceptual process of
"likeness", using a subset of all music, and to apply that model to the set of all music.
The preferred or alternative methods of collecting human data described with Figure 6
are used.
4.5.2 Creating the likeness model

The objective of the model is to predict the perceived differences using the
modelled descriptors of music. To build the model, a list of numbers is calculated for the
comparison of each song to each other song. The list of numbers consists of a value for
each descriptor where the value is the difference between the descriptor value for a first
song and the value of the same descriptor for a second song. When the model is used
to compare one song to another for likeness, the list of descriptor differences between
the two songs is calculated and these differences are the inputs to the model. The
model then yields a number that predicts the likeness that people would judge for the
same two songs. The model processes the list of descriptor difference values by
applying weights and heuristics to each value.

The preferred method of creating a model of likeness is to sum the weighted
descriptor difference values. For example, for three descriptors A, B and C, the
following steps are used for songs 1 and 2:

STEP 1 - subtract the descriptors of each song

    $A_1 - A_2, \quad B_1 - B_2, \quad C_1 - C_2$

STEP 2 - calculate the absolute differences. Our preferred method uses a value
of n = 2, but other values can be used.

    $\sqrt[n]{(A_1 - A_2)^n}, \quad \sqrt[n]{(B_1 - B_2)^n}, \quad \sqrt[n]{(C_1 - C_2)^n}$

STEP 3 - weight and sum the differences. The values of the weights ($\beta_1$ to $\beta_3$) are
determined by linear regression, as explained below.

    $\text{Likeness} = \beta_1 A_{\text{difference}} + \beta_2 B_{\text{difference}} + \beta_3 C_{\text{difference}}$

The values of the weights are determined by a process of linear regression, step
710, which seeks to minimize the difference, step 708, between the human-derived
likeness values, step 702, and the output from the model. The preferred model of
likeness is:

$$
\text{Likeness} = \beta_0 + \beta_1 \cdot \text{Energy} + \beta_2 \cdot \text{Tempo} + \beta_3 \cdot \text{Happiness} + \beta_4 \cdot \text{Anger} + \beta_5 \cdot \text{Danceability}
\tag{5}
$$

    $\beta_0 = 0$
    $\beta_1 = 11.4$
    $\beta_2 = 0.87$
    $\beta_4 = 6.3$
    $\beta_5 = 15.94$



The alternative methods of calculating likeness using correlations of parameters are also
used with the descriptors. The preferred weightings of the descriptors are:

Energy = 4.5

Tempo = 2.1

Happiness = 0.78

Anger = 0.55

Danceability = 3.7

4.5.3 Organizing and storing the likeness data

The preferred and alternative methods of organizing and storing the parameter
differences data for calculating likeness are also used when the process uses
descriptors. In addition, there is yet another alternative for calculating likeness. It
involves precomputing a series of hierarchical listings of the songs organized by their
descriptor values. For example, all of the songs in the database are organized into nine
classes according to their energy values. Then all of the songs with an energy value of 9
are organized by their tempo values, then all of the songs with energy value 8 are
organized by their tempo values, and so on, through all levels of all of the descriptors.
The sorting results in a maximum of $L^n$ similarity classes, where L is the number of levels
of each descriptor and n is the number of descriptors. In this case, with nine levels and
five descriptors, $9^5$ = 59049 classes. The songs in each class have identical descriptor
values. This provides an alternative type of likeness.
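
The class count and the grouping can be checked with a brief Python sketch. It assumes that grouping songs by their full tuple of descriptor values yields the same classes as the level-by-level sorting described above. The song identifiers and descriptor values are hypothetical, with the descriptors taken in the order energy, tempo, happiness, anger and danceability.

```python
from collections import defaultdict


def group_into_classes(song_descriptors):
    """Put songs with identical descriptor values into the same similarity class."""
    classes = defaultdict(list)
    for song_id, descriptors in song_descriptors.items():
        classes[tuple(descriptors)].append(song_id)
    return dict(classes)


levels = 9          # each descriptor takes one of nine values
n_descriptors = 5   # energy, tempo, happiness, anger, danceability
max_classes = levels ** n_descriptors

songs = {
    "s1": (9, 7, 3, 2, 8),
    "s2": (9, 7, 3, 2, 8),
    "s3": (8, 7, 3, 2, 8),
}
classes = group_into_classes(songs)
print(max_classes, classes)
```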

4.5.4 Limitations of the model

There are the same limitations on the model based on descriptors as there are on
the model based on parameters.

4.5.6 Searching the Likeness database
The likeness database may be quickly searched by starting with any song and
immediately finding the closest songs. A more enhanced search combines a similarity
search with a descriptor search and the descriptor adjustment step described above.
The steps are:

1a. Find a list of likeness matches, including some that are somewhat different.

1b. Present only the acceptable likeness songs.

2. When a search adjusted by a scalar descriptor is requested, rank the entire
list of likeness matches (1a) by the descriptor to be adjusted.

3. Present a new list based on the adjusted values with the best matches at the
top.

This means that the likeness list (1a) compiled for the original search is much broader
than the list displayed for the user (1b), and includes songs that are less similar to the
initial target song than would be tolerated by the listener. These poorer likeness
matches lie below the presentability threshold for the initial target.
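
The enhanced search can be sketched as two small functions. Several details are assumptions rather than statements from the specification: the two threshold values, the dictionary fields, and the reading of "rank by the descriptor to be adjusted" as ranking by closeness to the adjusted descriptor value, with likeness as the tie-breaker.

```python
def initial_search(candidates, broad_threshold, present_threshold):
    """Steps 1a and 1b: keep a broad list, present only the acceptable part."""
    broad = sorted(
        (c for c in candidates if c["likeness"] >= broad_threshold),
        key=lambda c: -c["likeness"],
    )
    shown = [c for c in broad if c["likeness"] >= present_threshold]
    return broad, shown


def adjusted_search(broad, descriptor, wanted_value):
    """Steps 2 and 3: re-rank the whole broad list, best matches at the top."""
    return sorted(
        broad,
        key=lambda c: (abs(c[descriptor] - wanted_value), -c["likeness"]),
    )


# Hypothetical candidates with likeness to the target and an energy descriptor.
candidates = [
    {"id": "a", "likeness": 8.5, "energy": 5},
    {"id": "b", "likeness": 7.9, "energy": 6},
    {"id": "c", "likeness": 6.0, "energy": 9},
    {"id": "d", "likeness": 2.0, "energy": 9},
]
broad, shown = initial_search(candidates, broad_threshold=5.0, present_threshold=7.0)
reranked = adjusted_search(broad, "energy", wanted_value=9)
print([c["id"] for c in shown], [c["id"] for c in reranked])
```

Song "c" is hidden in the first result list because it lies below the presentability threshold, yet it moves to the top once the user asks for more energy.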

What is claimed is:



1. A method for building a computational model of human perception of a descriptor
of music, comprising:

a) extracting from each of at least 5 electronic representations of musical
recordings at least two numeric parameters;

b) for each recording, combining the numeric parameters with a weighting
for each parameter to compute a single number representing the
descriptor for that recording;

c) adjusting the weightings for the parameters to find a set of weightings
where each computed descriptor for each recording most closely matches
perceptions reported for the recording by one or more human listeners.

2. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 1.

3. A method for generating a data record associated with a music recording, the
record comprising two or more scalar descriptors, each descriptor numerically
describing the recording of music with which the data record is associated,
comprising:

a) extracting from an electronic representation of the recording of music at
least two numeric parameters;

b) combining the numeric parameters with a weighting for each parameter to
compute a single number representing the descriptor for that recording,
where the weightings were previously determined by:

c) extracting from an electronic representation of each of at least 5
musical recordings the same at least two numeric parameters;

d) for each recording, combining the numeric parameters with a
weighting for each parameter to compute a single number
representing the descriptor for that recording;

e) adjusting the weightings for the parameters to find a set of
weightings where each computed descriptor for each recording
most closely matches perceptions reported for the recording by
one or more human listeners.

4. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 3.

5. A computer readable medium containing a computer extracted data record
associated with a music recording, the record comprising:

two or more scalar descriptors, each descriptor numerically describing the
recording of music with which the data record is associated, where each
descriptor was generated by:

a) extracting from an electronic representation of the recording of music at
least two numeric parameters;

b) combining the numeric parameters with a weighting for each parameter to
compute a single number representing the descriptor for that recording,
where the weightings were previously determined by:

c) extracting from an electronic representation of each of at least 5
musical recordings the same at least two numeric parameters;

d) for each recording, combining the numeric parameters with a
weighting for each parameter to compute a single number
representing the descriptor for that recording;

e) adjusting the weightings for the parameters to find a set of
weightings where each computed descriptor for each recording
most closely matches perceptions reported for the recording by
one or more human listeners.

6. A method for searching a database of data records associated with music
recordings to find a desired recording, comprising:

a) identifying a comparison data record associated with a music recording in
a computer readable database containing a plurality of data records, each
associated with a music recording, the data records each comprising two
or more scalar descriptors, each descriptor numerically describing the
recording of music with which the data record is associated, where each
descriptor was generated by:

1) extracting from an electronic representation of the recording of
music at least two numeric parameters;

2) combining the numeric parameters with a weighting for each
parameter to compute a single number representing the descriptor
for that recording, where the weightings were previously
determined by:

3) extracting from an electronic representation of each of at least 5
musical recordings the same at least two numeric parameters;

4) for each recording, combining the numeric parameters with a
weighting for each parameter to compute a single number
representing the descriptor for that recording;

5) adjusting the weightings for the parameters to find a set of
weightings where each computed descriptor for each recording
most closely matches perceptions reported for the recording by
one or more human listeners; and

b) searching the database to find data records with descriptors that are
similar to the descriptors of the comparison record.

7. The method of claim 6 further including, prior to searching the database,
specifying that one of the descriptors of the comparison data record should be
adjusted with an increase or a decrease, and the searching step is based on the
descriptors of the comparison data record as adjusted.

8. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 6.

9. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 7.

10. A method for building a computational model of human perception of likeness
between musical recordings, comprising:

a) extracting from each of at least 5 electronic representations of musical
recordings at least two numeric parameters;

b) receiving from one or more human listeners who compare pairs of the
musical recordings an indication of the human's perception of likeness for
each compared pair of recordings;

c) for each compared pair of the recordings, comparing each numeric
parameter of one recording in the pair with the corresponding parameter
of the second recording of the pair using an algorithm which produces a
parameter comparison number representing the parameter comparison;

d) for each compared pair of the recordings, combining the parameter
comparison numbers with a weighting for each parameter comparison
number to compute a single difference number representing the
difference between the two recordings of the pair;

e) adjusting the weightings for the comparison numbers to find a set of
weightings where each computed difference number for each pair of
recordings most closely matches perceptions reported for the pair of
recordings by the one or more human listeners.

11. The method of claim 10 where the algorithm includes subtraction of parameter
values.

12. The method of claim 10 where the algorithm includes computing a correlation
between parameter values.

13. The method of claim 10 where, prior to the step of comparing the numeric
parameters:

a) the parameters for each recording are combined with a weighting for each
parameter to compute a single number representing a descriptor for that
recording, where

b) the weightings were previously determined by adjusting the weightings to
find a set of weightings where each computed descriptor for each
recording most closely matches perceptions reported for the recording by
one or more human listeners, and

c) the descriptors are then used in the step of comparing the numeric
parameters in place of the parameters.

14. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 10.

15. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 11.

16. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 12.

17. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 13.
18. A method for creating a database of differences between music recordings,
comprising:

a) associating an identifier with each recording of a plurality of music
recordings;

b) extracting from each recording of the plurality of recordings at least two
numeric parameters;

c) computing from the extracted parameters for each of a plurality of pairs of
the recordings a number which represents the difference between the
recordings of the pair; and

d) assembling the computed difference numbers into a database where each
computed difference is associated with the identifier for each of the two
recordings from which the difference was computed.

19. The method of claim 18 where the computing step includes subtraction of
parameter values.

20. The method of claim 18 where the computing step includes computing a
correlation between parameter values.

21. The method of claim 18 where the algorithm used to compute numbers which
represent differences between pairs of recordings is empirically developed based
on perceptions reported by one or more human listeners.
22. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 18.

23. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 19.

24. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 20.

25. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 21.

26. A method for finding a music recording which is perceived by humans to be like
another music recording, comprising:

a) receiving a specification of a target music recording; and

b) searching a database containing computed difference numbers between
the target recording and a plurality of other recordings for those
recordings which have a small computed difference number from the
target music recording.

27. The method of claim 26 where the database is created by:

a) associating an identifier with each recording of a plurality of music
recordings;

b) extracting from each recording of the plurality of recordings at least two
numeric parameters;

c) computing from the extracted parameters for pairs of the recordings a
number which represents the difference between the recordings of the
pair; and

d) assembling the computed difference numbers into a database where each
computed difference is associated with the identifier for each of the two
recordings from which the difference was computed.

28. The method of claim 27 where the step of computing numbers which represent
differences between pairs of recordings is empirically developed based on
perceptions of one or more human listeners.
29. The method of claim 28 where the step of computing a number which represents
the difference between the recordings of a pair of recordings includes the
intermediate steps of:

a) combining the parameters for each recording with a weighting for each
parameter to compute a single number representing a descriptor for that
recording, where

b) the weightings were previously determined by adjusting the weightings to
find a set of weightings where each computed descriptor for each
recording most closely matches perceptions reported for the recording by
one or more human listeners, and

c) the descriptors are then used in place of the parameters to compute a
number which represents the difference between the recordings of the
pair.

30. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 26.

31. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 27.

32. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 28.

33. A computer readable medium containing a computer program which causes a
computer to perform the method of claim 29.
ABSTRACT

A method for characterizing a musical recording as a set of scalar descriptors,
each of which is based on human perception. A group of people listens to a large
number of musical recordings and assigns to each one many scalar values, each value
describing a characteristic of the music as judged by the human listeners. Typical scalar
values include energy level, happiness, danceability, melodicness, tempo, and anger.
Each of the pieces of music judged by the listeners is then computationally processed to
extract a large number of parameters which characterize the electronic signal within the
recording. Algorithms are empirically generated which correlate the extracted
parameters with the judgments based on human perception to build a model for each of
the scalars of human perception. These models can then be applied to other music
which has not been judged by the group of listeners to give to each piece of music a set
of scalar values based on human perception. The set of scalar values can be used to
find other pieces that sound similar to humans or vary in a dimension of one of the
scalars.
PATENT

DECLARATION AND POWER OF ATTORNEY 
IN PATENT APPLICATION 



Attorney Docket No.: 1709.1 -a 



As a below named inventor, I hereby declare: 



My residence, post office address and citizenship are as stated below next to my 
name. 



I believe that I am the original, first and sole inventor (if only one name is listed
below) or a joint inventor (if plural inventors are listed below) of the subject matter
that is claimed and for which a patent is sought on the invention entitled:

MUSIC SEARCHING METHODS BASED ON HUMAN PERCEPTION

in the specification of which

is attached hereto.

I hereby state that I have reviewed and understand the contents of the
above-identified specification, including the claims, as amended by any amendment
referred to above.

I acknowledge the duty to disclose information which is material to patentability as
defined in Title 37, Code of Federal Regulations, Section 1.56.

I hereby claim the benefit under Title 35, United States Code, Section 119(e) of any
United States provisional application(s) listed below.

Application No.     Filing Date

60/153,768          September 14, 1999



I hereby appoint the attorneys associated with Customer No. 000996 to prosecute 
this application and to transact all business in the United States Patent and 
Trademark Office connected therewith. Address all correspondence and phone
calls to: 

Jeffrey T. Haley 
GRAYBEAL JACKSON HALEY LLP 
155 - 108th Avenue NE, Suite 350
Bellevue, WA 98004-5901 USA

Telephone (425) 455-5575 
Facsimile (425) 455-1046 

I hereby further declare that all statements made herein of my own knowledge are
true and that all statements made on information and belief are believed to be true;
and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code, and that such willful false
statements may jeopardize the validity of the application or any patent issued
thereon.



Maxwell J. Wells                                  UK
Full Name of Inventor                             Citizenship

6817 - 44th Avenue Northeast, Seattle, Washington 98115
Residence

Inventor's Signature                              Date


David Walter                                      USA
Full Name of Inventor                             Citizenship

7634 Hollister Ave., Ste 351, Goleta CA 93117
Residence

Inventor's Signature                              Date


Navdeep S. Dhillon                                USA
Full Name of Inventor                             Citizenship

8011 - 29th Avenue Northwest, Seattle, Washington 98117
Residence

Inventor's Signature                              Date
Maxwell J. Wells                                  UK
Full Name of Inventor                             Citizenship

6817 - 44th Avenue Northeast, Seattle, Washington 98115
Residence

Inventor's Signature                              Date


David Waller                                      USA
Full Name of Inventor                             Citizenship

7634 Hollister Ave., Ste 351, Goleta CA 93117
Residence


Navdeep S. Dhillon                                USA
Full Name of Inventor                             Citizenship

8011 - 29th Avenue Northwest, Seattle, Washington 98117
Residence

Inventor's Signature                              Date
VERIFIED STATEMENT (DECLARATION) CLAIMING SMALL ENTITY STATUS
(37 CFR 1.9(f) AND 1.27(c)) - SMALL BUSINESS CONCERN

□ Applicant(s) or □ Patentee(s): Maxwell J. Wells, David Walter and Navdeep S.
Dhillon

□ Application orO Patent No.: Docket No. : 1702-1-.^ 

Title: MUSIC SEARCHING METHODS BASED ON HUMAN PERCEPTION
I hereby declare that I am 



an official of the small business concern empowered to act on behalf of the 
concern identified below: 



NAME OF CONCERN CantaMetrix, Inc.

I ADDRESS OF CONCERN 110 11Q«- Av..>,n.. nf .<^..iteft<»n 

Bellevue, Washington 98004



I hereby declare that the above-identified small business qualifies as a small business
concern as defined in 13 CFR §121.3-18, and reproduced in 37 CFR 1.9(d), for
purposes of paying reduced fees under Section 41(a) and (b) of Title 35, United States
Code, in that the number of employees of the concern, including those of its affiliates,
does not exceed 500 persons. For purposes of this statement, (1) the number of
employees of the business concern is the average over the previous fiscal year of the
concern of the persons employed on a full-time, part-time or temporary basis during
each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other
when either, directly or indirectly, one concern controls or has the power to control the
other, or a third party or parties controls or has the power to control both.

I hereby declare that rights under contract or law have been conveyed to and remain
with the small business concern identified above with regard to the invention described
in the specification filed herewith with the title as listed above.



If the rights held by the above-identified small business concern are not exclusive, each 
which would not qualify as a small business concern under 37 CFR 1.9(d) or a
nonprofit organization under 37 CFR 1.9(e).

*NOTE: Separate verified statements are required from each named person,
concern or organization having rights to the invention averring to their status
as small entities. (37 CFR 1.27)



FULL NAME. 



ADDRESS. 



□ individual □ Small Business Concern □ Nonprofit Organization



FULL NAME 



ADDRESS . 

□ individual □ Small Business Concern □ Nonprofit Organization 




FULL NAME. 



ADDRESS. 



□ individual □ Small Business Concern □ Nonprofit Organization 

I acknowledge the duty to file, in this application or patent, notification of any change
in status resulting in loss of entitlement to small entity status prior to paying, or at the
time of paying, the earliest of the issue fee or any maintenance fee due after the
date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b)).

I hereby declare that all statements made herein of my own knowledge are true and
that all statements made on information and belief are believed to be true; and
further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code, and that such willful false
statements may jeopardize the validity of the application, any patent issuing thereon,
or any patent to which this verified statement is directed.



Maxwell J. Wells



NAME OF PERSON SIGNING 

TITLE C-HiSt^ ^S:cHA>0'.'>O-y Of^fi-icfcdt Ch fit t/t^fi> 



SIGNATURE. 



.DATE. 
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